-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run singularity-eos through clang address sanitizer... and fix the HIP segfault! #437
Conversation
…, clang undefined behavior sanitizer
@Yurlungur I think the issue here is that |
Thanks @rbberger . Right this makes sense. If we were passing by value, this wouldn't be a problem right? Its because we use forwarding references in our accessors, which allows the compiler to bind them as references. I wonder if this suggests that we should insist our accessors be passed by value. And we just assume reference semantics under the hood, which is how C-style arrays and views work anyway. |
… clang-sanitize builds, since flang is not super well supported yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I went through and cleaned up more warnings. I also added the clang-sanitizer test as a github workflow. This required adding some new cmake build options to support that. I made this test also fail if it encounters any warnings, which I think will help us catch problematic code going forward. The clang sanitizer checks address/memory errors as well as looks for undefined behavior.
I also snuck in some submodule updates (which shouldn't impact the spack builds at all).
@@ -359,14 +359,14 @@ PORTABLE_INLINE_FUNCTION Real Gruneisen::PressureFromDensityInternalEnergy( | |||
template <typename Indexer_t> | |||
PORTABLE_INLINE_FUNCTION Real | |||
Gruneisen::MinInternalEnergyFromDensity(const Real rho_in, Indexer_t &&lambda) const { | |||
const Real rho = std::min(rho_in, _rho_max); | |||
// const Real rho = std::min(rho_in, _rho_max); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are unused var warnings but I want to keep the vars here as a comment, for if/when we fill this in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
@@ -370,7 +370,7 @@ MGUsup::FillEos(Real &rho, Real &temp, Real &sie, Real &press, Real &cv, Real &b | |||
} | |||
if (output & thermalqs::temperature) | |||
temp = TemperatureFromDensityInternalEnergy(rho, sie); | |||
if (output & thermalqs::specific_internal_energy) sie = sie; | |||
// if (output & thermalqs::specific_internal_energy) sie = sie; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silences warning from self-assignment. But I want to leave it in as a comment.
: _a0(a0), _a1(a1), _a2c(a2c), _a2e(a2e), _a3(a3), _b0(b0), _b1(b1), _b2c(b2c), | ||
_b2e(b2e), _b3(b3), _rho0(rho0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silences warning about initializer list ordering.
// JMM: Address sanitizer likes these named. My guess is | ||
// that the forwarding references are not being resolved | ||
// properly. | ||
Real *prho_pte = &rho_pte(tid, 0); | ||
Real *pvfrac_pte = &vfrac_pte(tid, 0); | ||
Real *psie_pte = &sie_pte(tid, 0); | ||
Real *ptemp_pte = &temp_pte(tid, 0); | ||
Real *ppress_pte = &press_pte(tid, 0); | ||
Real *pscratch = &solver_scratch(tid, 0); | ||
PTESolverRhoT<singularity::EOSAccessor_, Real *, Real **> method( | ||
npte, eos_inx, vfrac_sum, sie_v(i), &rho_pte(tid, 0), &vfrac_pte(tid, 0), | ||
&sie_pte(tid, 0), &temp_pte(tid, 0), &press_pte(tid, 0), cache, | ||
&solver_scratch(tid, 0)); | ||
npte, eos_inx, vfrac_sum, sie_v(i), prho_pte, pvfrac_pte, psie_pte, | ||
ptemp_pte, ppress_pte, cache, pscratch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (and similar changes) silence the segfault. :) We may want to do more, though. See issue #438
# Optionally build these if you want to build/test the fortran | ||
# infrastructure without invoking a fortran compiler If fortran is | ||
# off, you can set this. If fortran is on, it's forced to ON and is | ||
# not set-able. | ||
cmake_dependent_option(SINGULARITY_BUILD_FORTRAN_BACKEND | ||
"Build the C++ code to which the fortran bindings bind" | ||
OFF "NOT SINGULARITY_USE_FORTRAN" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this mainly to support building with clang, because flang is still not super well supported, and I wanted to be able to run the C++ code through clang sanitizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great idea! Thanks for adding it
@@ -590,6 +603,7 @@ endif() | |||
|
|||
if(SINGULARITY_BUILD_TESTS) | |||
include(CTest) | |||
enable_testing() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how tests ever ran without this. Did cmake get stricter at some point?
constexpr Real GPa = 1.0e10; | ||
constexpr Real MJ_per_kg = 1.0e10; | ||
|
||
#ifdef SINGULARITY_TEST_SESAME | ||
#ifdef SINGULARITY_USE_SPINER_WITH_HDF5 | ||
|
||
constexpr Real GPa = 1.0e10; | ||
constexpr Real MJ_per_kg = 1.0e10; | ||
|
||
using singularity::DavisProducts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silences unused variable warning when this test is disabled.
int N_; | ||
Real *P_; | ||
Real *rho_; | ||
Real *sie_; | ||
int N_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silences initializer list warning
utils/kokkos
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I snuck in submodule updates to more recent versions of kokkos, ports-of-call, and kokkos-kernels. Only impacts submodule mode, but the versions we were pinned to were pretty old. This also silences some warnings from ports of call. @pdmullen this should fix the warnings you guys were seeing in artemis. :)
.github/workflows/sanitizer.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test will fail if the sanitizer is unhappy OR if there are any warnings in the set of code the code tests. I want to be stricter about this going forward, as I think it will help us catch problems like this in the future.
@dholladay00 @jhp-lanl @rbberger @mauneyc-LANL this is ready for review. @jdolence this resolves the HIP segfault. I don't think it was going to be an issue for riot, but now we can be confident. @pdmullen this should silence the warnings you were seeing in artemis. |
All tests now passing on re-git including the HIP tests that were previously failing. |
OK also added checks for compiler warnings from GCC, which caught a few more things, now cleaned up. |
-DSINGULARITY_FORCE_SUBMODULE_MODE=ON \ | ||
-DSINGULARITY_USE_KOKKOS=ON \ | ||
.. | ||
make -j4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to run the code, this is just to check for warnings.
static struct ThermalUnitsInit { | ||
[[maybe_unused]] static struct ThermalUnitsInit { | ||
} thermal_units_init_tag; | ||
static struct LengthTimeUnitsInit { | ||
[[maybe_unused]] static struct LengthTimeUnitsInit { | ||
} length_time_units_init_tag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang didn't care that these tags might be unused, maybe because it could tell they were for tag dispatch. But gcc did care. [maybe_unused]
is a C++17 feature, so we can safely use it and assume it's compiler agnostic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving with some minor comments/requests.
# Optionally build these if you want to build/test the fortran | ||
# infrastructure without invoking a fortran compiler If fortran is | ||
# off, you can set this. If fortran is on, it's forced to ON and is | ||
# not set-able. | ||
cmake_dependent_option(SINGULARITY_BUILD_FORTRAN_BACKEND | ||
"Build the C++ code to which the fortran bindings bind" | ||
OFF "NOT SINGULARITY_USE_FORTRAN" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great idea! Thanks for adding it
Merging this now, but still open to feedback if anyone else reviews. I can submit an MR later. |
Also thanks for the review @jhp-lanl ! |
PR Summary
I ran singularity-eos through the clang address and undefined behavior sanitizer on my laptop. It found a few minor things like undefined variables, but it also found the source of the HIP segfault. The issue was that the constructors for the pte solvers use forwarding references, e.g.,
but the accessors in
get_sg_eos
all call the constructor by passing in accessors to 2d views, e.g.,&rho_pte(tid, 0)
which are located from the unique token logic used to prevent over-allocated scratch memory (for example).The problem is that the compiler is capturing these in the function call as temprorary variables, but the perfect forwarding references are then capturing them by reference, not value, meaning they go out of scope before the solver is actually used.
The issue is resolved by explicitly declaring variables for each of the accessors passed in. e.g.,
We may wish to remove perfect forwarding down the line and instead insist these constructors capture by value. See #438 .
Here's the build command I used.
cmake -DSINGULARITY_FORCE_SUBMODULE_MODE=ON -DSINGULARITY_USE_KOKKOS=ON -DSINGULARITY_BUILD_TESTS=ON -DCMAKE_CXX_COMPILER=clang++-15 -DCMAKE_CXX_FLAGS="-fsanitize=address -fsanitize=undefined -Wall -fno-omit-frame-pointer" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_BUILD_TYPE=Debug ..
I added this to the github-workflows as a test. In the same test, I also add a strict check for warnings. If there are any warnings in the subset of code built, the test fails. Naturally, I also went through the code and eliminated said warnings. :)
PR Checklist
make format
command after configuring withcmake
.If preparing for a new release, in addition please check the following:
when='@main'
dependencies are updated to the release version in the package.py